Probabilistic Entity Linkage for Heterogeneous Information Spaces

Authors

  • Ekaterini Ioannou
  • Claudia Niederée
  • Wolfgang Nejdl
Abstract

Heterogeneous information spaces are typically created by merging data from a variety of applications and information sources. These sources often use different identifiers for data that describe the same real-world entity (for example, an artist, a conference, or an organization). In this paper we propose a new probabilistic entity linkage algorithm for identifying and linking data that refer to the same real-world entity. Our approach focuses on managing entity linkage information in heterogeneous information spaces using probabilistic methods. We use a Bayesian network to model the evidence that supports the possible object matches, along with the interdependencies between them. This enables us to flexibly update the network when new information becomes available, and to cope with the different requirements imposed by applications built on top of information spaces.
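
As a rough illustration of updating a match probability when new evidence arrives, the sketch below uses the simplest possible Bayesian network: one "match" node with conditionally independent evidence nodes. This is not the algorithm proposed in the paper, which also models interdependencies between evidence. The function name `update_match_probability` and all probability values are hypothetical.

```python
def update_match_probability(prior, evidence):
    """Return P(match | evidence) for a candidate pair of entity descriptions.

    Assumption (not from the paper): every piece of evidence is conditionally
    independent given the match variable, i.e. a naive Bayes structure.

    prior    -- P(match) before seeing the evidence, strictly between 0 and 1
    evidence -- iterable of (P(e | match), P(e | no match)) pairs, all > 0
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must be strictly between 0 and 1")
    odds = prior / (1.0 - prior)
    for p_given_match, p_given_nonmatch in evidence:
        if p_given_match <= 0.0 or p_given_nonmatch <= 0.0:
            raise ValueError("likelihoods must be positive")
        # Bayes' rule in odds form: multiply by the likelihood ratio.
        odds *= p_given_match / p_given_nonmatch
    return odds / (1.0 + odds)


# Hypothetical evidence: similar names, then the same affiliation.
name_similarity = (0.8, 0.2)
same_affiliation = (0.6, 0.3)

# All evidence at once.
batch = update_match_probability(0.1, [name_similarity, same_affiliation])

# Incremental update: the earlier posterior becomes the new prior.
step1 = update_match_probability(0.1, [name_similarity])
step2 = update_match_probability(step1, [same_affiliation])
```

Under the independence assumption, the incremental and batch results agree, which is what makes it cheap to fold in new information without recomputing from scratch. An application could then apply its own threshold to the resulting probability.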


Similar resources

Entity linkage for heterogeneous, uncertain, and volatile data

A plethora of collections is nowadays created by merging data from a variety of different applications and information sources. These sources often use different identifiers for data that describe the same real-world object, for example an artist, a conference, or an organization. The many existing entity linkage approaches are not designed for the characteristics of modern applications...

Probabilistic Linkage of Persian Record with Missing Data

Extended Abstract. When comprehensive information about a topic is scattered across two or more data sets, using only one of them loses the information available in the others. Hence, it is necessary to integrate the scattered information into a single comprehensive data set. On the other hand, sometimes we are interested in recognizing duplicates within a data set. The i...

Totally probabilistic Lp spaces

In this paper, we introduce the notion of probabilistic valued measures as a generalization of non-negative measures and construct the corresponding $L^p$ spaces, for distributions $p > \varepsilon_0$. It is also shown that if the distribution $p$ satisfies $p \geq \varepsilon_1$ then, as in the classical case, these spaces are complete probabilistic normed spaces.

A COMMON FRAMEWORK FOR LATTICE-VALUED, PROBABILISTIC AND APPROACH UNIFORM (CONVERGENCE) SPACES

We develop a general framework for various lattice-valued, probabilistic and approach uniform convergence spaces. To this end, we use the concept of $s$-stratified $LM$-filter, where $L$ and $M$ are suitable frames. A stratified $LMN$-uniform convergence tower is then a family of structures indexed by a quantale $N$. For different choices of $L,M$ and $N$ we obtain the lattice-valued, probabili...

On-the-Fly Entity-Aware Query Processing in the Presence of Linkage

Entity linkage is central to almost every data integration and data cleaning scenario. Traditional techniques use some computed similarity among data structures to perform merges and then answer queries on the merged data. We describe a novel framework for entity linkage with uncertainty. Instead of using the linkage information to merge structures a priori, possible linkages are stored alongsid...

Publication date: 2008